ABSTRACT

Concept learning provides a natural framework in which to place the problems solved 
by the quantum algorithms of Bernstein-Vazirani and Grover. By combining the tools
used in these algorithms — quantum fast transforms and amplitude amplification — with a 
novel (in this context) tool — a solution method for geometrical optimization problems — 
we derive a general technique for quantum concept learning. We name this technique 
"Amplified Impatient Learning" and apply it to construct quantum algorithms solving two 
new problems: BATTLESHIP and MAJORITY, more efficiently than is possible classically. 
1. Introduction

Over the past decade increasing numbers of scientists have built quantum computation 
into an imposing edifice. The paucity of quantum algorithms, however, betrays a certain 
emptiness at its center. Only a handful of problems are known to be solvable more effi- 
ciently quantum mechanically than classically, and even fewer general quantum algorithmic 
techniques are known. The latter include quantum fast transforms [1-6] and amplitude 
amplification [7-11]. In this paper we explain how to combine these techniques with a 
new (in this context) one — a solution method for geometrical optimization problems — into 
quantum algorithms that solve new classes of problems. 

These new problems can be thought of as generalizations of the structured and un- 
structured search problems solved by the quantum algorithms of Bernstein and Vazirani 
[12] and Grover [7]. Our thinking, however, is largely informed by a branch of classical 
artificial intelligence — machine learning [13], or more specifically, computational learning 
theory [14]. 

In this subject, a concept is a map $c : X \to \mathbb{Z}_2$, defined on some discrete set $X$; the
support of the function, $c^{-1}(1) \subset X$, is the extension of the concept. For example, let $X$
be the set of all balloons, and define $c(x) = 1$ if and only if $x \in X$ is red; this concept is
"red balloon". Concept learning is the process by which a student (the learner) identifies
(or approximates) a target concept from a concept class $C$ of possible concepts. Learning
can be passive, in situations where examples $x \in X$ are presented to the student by some
external mechanism, or active, in situations where the student can query a teacher for
information about the target concept. In the latter case, Angluin has defined a minimally
adequate teacher to consist of a pair of oracles: a membership oracle that responds to a
query $x \in X$ with $c(x)$, where $c \in C$ is the target concept; and an equivalence oracle that
responds to a query $c' \in C$ with $\delta_{c'c}$ [15].

The number of queries made by a learning algorithm is the query complexity of the 
algorithm; the number of queries to the membership oracle is its sample complexity. These 
are distinct from the computational complexity of the algorithm, which is defined in the 
usual way [16]. A family of concept classes $C_i$, for $0 < i \in \mathbb{Z}$, is an infinite sequence of
concept classes defined on a corresponding sequence of sets $X_i$. A learning algorithm for
such a family is a sequence of learning algorithms, one for each $C_i$. Since each algorithm in
the sequence has a sample complexity, we can discuss the asymptotic sample complexity 
of the family. As we describe in detail in §2, both Bernstein and Vazirani's and Grover's 
algorithms can be interpreted as quantum algorithms for concept learning from a member- 
ship oracle, each with a sample complexity that is asymptotically smaller than the sample 
complexity of the best possible classical learning algorithm for the same problem. 

Bernstein and Vazirani's algorithm is particularly striking because it requires only a 
single query to the membership oracle to learn any concept in the problem class. Only very 
special concept learning problems have quantum sample complexity 1 in this sense. In §2 
we explain that these are learning problems in what should be described as "Hadamard" 
concept classes. Other learning problems, like the one solved by Grover's algorithm, have
quantum sample complexity greater than 1, but one can ask how well a student can learn 
with a single query. In §3 we pose this problem of "impatient learning" precisely, and show 
that it is answered by the solution to a certain geometric optimization problem. 

With additional queries we should expect superior results. In §4 we show that the 
quantum computing technique of amplitude amplification [7-11] corresponds to querying 
also the other half of a minimally adequate teacher, the equivalence oracle. Using an equiv- 
alence oracle we can define a general quantum learning algorithm, but without the use of 
some structure in — or symmetry of — the concept class, it is precisely Grover's algorithm, 
with the queries interpreted as being to the equivalence oracle, rather than to the mem- 
bership oracle. In §5 we review group algebras, in order to describe particular symmetries 
of concept classes. These symmetries — via quantum fast transforms — allow equivalence 
queries to be combined with optimal impatient learning algorithms to achieve performance 
superior to use of equivalence queries alone. In §6 and §7 we analyze the resulting quantum 
algorithms for concept classes with $\mathbb{Z}_N$ and $\mathbb{Z}_2^n$ symmetry, respectively. We obtain efficient
quantum algorithms for two novel problems: BATTLESHIP and MAJORITY. 

We conclude in §8 with a discussion of the optimality of our quantum algorithms, 
and their relevance to a pair of conjectured upper bounds for the sample complexity of 
quantum learning algorithms. 

2. Formalization of quantum learning algorithms 

Bernstein and Vazirani's search problem is the task of identifying $a \in \mathbb{Z}_2^n$, given a
'sophisticated' oracle that returns $a \cdot x \bmod 2$ when queried about $x \in \mathbb{Z}_2^n$ [17]. From our
point of view, it can also be interpreted as an instance of active learning with access to a
membership oracle. There is a family of concept classes $BV^n$ for $0 < n \in \mathbb{Z}$, with

$$BV^n = \{p_a : \mathbb{Z}_2^n \to \mathbb{Z}_2 \mid p_a(x) = a \cdot x \bmod 2 \text{ for } a \in \mathbb{Z}_2^n\},$$

consisting of the concepts "bit string with odd inner product with $a$" for $a \in \mathbb{Z}_2^n$. Since
the concept class $BV^n$ is parameterized by $a \in \mathbb{Z}_2^n$, identifying $a$ is equivalent to learning a
target concept $p_a$ by querying a membership oracle. Classically this learning problem has
sample complexity $\Omega(n)$.

In Bernstein and Vazirani's quantum algorithm for this problem, as well as in all
the quantum concept learning algorithms we consider in this paper, the "data structure"
consists of a query "register" and a response "register"; the Hilbert space of states is
$\mathbb{C}^{|X|} \otimes \mathbb{C}^2$. A membership oracle for target concept $c$ acts via the unitary transformation
$U_c$ defined by linear extension from its action on the computational basis,
$\{|x, b\rangle \mid x \in X, b \in \mathbb{Z}_2\}$, namely $U_c|x, b\rangle = |x, b + c(x)\rangle$, where "+" denotes addition modulo 2. Let
$|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$. Then $U_c|x\rangle|-\rangle = (-1)^{c(x)}|x\rangle|-\rangle$. We will use this "phase kickback"
trick [18] throughout, so we need only concentrate on the query register and, abusing
notation slightly, write $U_c|x\rangle = (-1)^{c(x)}|x\rangle$.
With this notation, Bernstein and Vazirani's algorithm is summarized by the equation:

$$H^{\otimes n} U_{p_a} H^{\otimes n}|0\rangle = |a\rangle, \tag{2.1}$$

where $H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ and $|0\rangle \in (\mathbb{C}^2)^{\otimes n}$. That is, from an initial state $|0\rangle$, we apply the
Hadamard transform, $H^{\otimes n}$; query the membership oracle; and apply the Hadamard
transform again. The result is the state $|a\rangle$, so a measurement in the computational basis
identifies the target concept $p_a$ with probability 1. The quantum sample complexity of
this algorithm is 1, a substantial improvement over the classical sample complexity.
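
The three steps of (2.1) can be checked numerically on a small instance. The following is a minimal sketch, not the authors' code: it assumes bit strings are stored as Python integers, and the helper names (`hadamard_transform`, `parity`, `bernstein_vazirani`) are hypothetical.

```python
import math

def hadamard_transform(state):
    """Apply the n-fold Hadamard transform to a list of 2^n amplitudes."""
    amps = list(state)
    size = len(amps)
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for i in range(start, start + h):
                a, b = amps[i], amps[i + h]
                amps[i], amps[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(size)
    return [v / norm for v in amps]

def parity(a, x):
    """Inner product a.x mod 2 of two bit strings stored as integers."""
    return bin(a & x).count("1") % 2

def bernstein_vazirani(a, n):
    """Amplitudes of H U_{p_a} H |0> for the hidden string a (one oracle call)."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0                                   # initial state |0...0>
    state = hadamard_transform(state)                # equal superposition of queries
    state = [(-1) ** parity(a, x) * amp for x, amp in enumerate(state)]  # phase oracle
    return hadamard_transform(state)

final = bernstein_vazirani(a=0b101, n=3)
recovered = max(range(len(final)), key=lambda x: abs(final[x]))
print(recovered, round(abs(final[recovered]) ** 2, 6))
```

In this sketch all of the final amplitude sits on the basis state labeled by the hidden string, so a single measurement recovers it.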

To understand why this algorithm works, notice that after the first Hadamard
transform in (2.1), the state of the query register is in an equal superposition of all possible
queries:

$$H^{\otimes n}|0\rangle = \frac{1}{\sqrt{2^n}} \sum_{x \in \mathbb{Z}_2^n} |x\rangle.$$

Such an equal superposition is the state before the initial query in each of the quantum
algorithms we discuss in this paper. Acting on this state by $U_{p_a}$ produces one of $|BV^n| = 2^n$
possible vectors, according to the value of $a$. Let $A_{BV}$ be the matrix that has these vectors
as columns. In general we make the following definition.

Definition. For any concept class $C$ defined over a set $X$, define the membership query
matrix $A_C$ to be the $|X| \times |C|$ matrix with $c$th column

$$\frac{1}{\sqrt{|X|}} \sum_{x \in X} (-1)^{c(x)} |x\rangle$$

for $c \in C$. In this paper we only consider concept classes for which there is a bijection
between $X$ and $C$; we call these matched concept classes. For matched concept classes, the
membership query matrix is square.

For the Bernstein and Vazirani problem, the membership query matrix has entries

$$(A_{BV})_{xa} = \frac{(-1)^{x \cdot a}}{\sqrt{2^n}},$$

which we recognize as the entries of $H^{\otimes n}$. Thus the final Hadamard transform in (2.1)
acts as

$$H^{\otimes n}(A_{BV})_a = (H^{\otimes n}A_{BV})_a = (H^{\otimes n}H^{\otimes n})_a = (I)_a = |a\rangle, \tag{2.2}$$

since $H = H^{-1}$. That is, it inverts the query matrix. Clearly, then, the sample complexity
of quantum learning in any concept class with a unitary membership query matrix is 1.
Since such a membership query matrix is just a Hadamard matrix in the traditional sense
(an orthogonal matrix with entries ±1) [19,20], normalized by $\sqrt{|X|}$, we refer to such
concept classes as Hadamard concept classes.*



* This nomenclature is motivated by van Dam's paper on a quantum algorithm for the quadratic 
residue problem [21]. 
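
As an illustration of the definition of the membership query matrix, the sketch below builds $A_{BV}$ for $n = 3$ and checks that its columns are orthonormal, which is the defining property of a Hadamard concept class. The function names are hypothetical and concepts are represented as plain Python callables returning 0 or 1.

```python
import math

def membership_query_matrix(domain, concepts):
    """Entry (x, c) is (-1)^{c(x)} / sqrt(|X|): column c is the state after
    one membership query on the equal superposition."""
    scale = 1 / math.sqrt(len(domain))
    return [[scale * (-1) ** c(x) for c in concepts] for x in domain]

def is_orthogonal(matrix, tol=1e-9):
    """True if the columns of a real square matrix are orthonormal."""
    cols = list(zip(*matrix))
    return all(abs(sum(u * v for u, v in zip(ci, cj)) - (i == j)) < tol
               for i, ci in enumerate(cols) for j, cj in enumerate(cols))

n = 3
domain = list(range(2 ** n))
# One concept p_a per bit string a: p_a(x) = a.x mod 2.
bv_concepts = [lambda x, a=a: bin(a & x).count("1") % 2 for a in domain]
a_bv = membership_query_matrix(domain, bv_concepts)
print(is_orthogonal(a_bv))
```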
Not all learning problems, of course, are this easy. Grover's search problem can also be
interpreted as an instance of active concept learning with access to a membership oracle.
In this case there is a family $\{G_N\}$ of concept classes, for $0 < N \in \mathbb{Z}$, with

$$G_N = \{\delta_a : \mathbb{Z}_N \to \mathbb{Z}_2 \mid \delta_a(x) = \delta_{ax} \text{ for } a \in \mathbb{Z}_N\},$$

which consists of the concepts "is the number $a$" for $a \in \mathbb{Z}_N$. The task is to identify
$a$ given an oracle that returns $\delta_{ax}$ when queried about $x$. Since the concept class $G_N$ is
parameterized by $a \in \mathbb{Z}_N$, identifying $a$ is equivalent to identifying a target concept $\delta_a$.
Classically this learning problem has sample complexity $\Omega(N)$.

Quantum mechanically, this oracle acts by a unitary matrix $U_{\delta_a}$, so the membership
query matrix for this problem has entries

$$(A_G)_{xa} = \frac{1 - 2\delta_{xa}}{\sqrt{N}} = \frac{1}{\sqrt{N}}\left(N F^\dagger|0\rangle\langle 0|F - 2I\right)_{xa}, \tag{2.3}$$

where $F$ is the $N$-dimensional discrete Fourier transform. Clearly $A_G$ is not unitary,
so $G_N$ is not a Hadamard concept class, and a single query does not suffice to learn a
target concept. In fact, Bernstein and Vazirani [12] showed that (in our language) the
sample complexity of Grover's learning problem is $\Omega(\sqrt{N})$. Nevertheless, one might ask
how well it is possible to do with a single query. That is, if we can make any unitary
transformation (independent of $a$) after a single query, how do we maximize the probability
that a measurement in the computational basis $\{|x\rangle \mid x \in \mathbb{Z}_N\}$ returns $a$? We give a
general solution to this problem of impatient learning in the next section, and then apply
it to Grover's problem in §5.
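
To see concretely why $A_G$ fails to be unitary, one can compute the overlap between two distinct columns of (2.3). A short calculation from the entries gives $(N - 4)/N$, which is nonzero for $N \ne 4$. The sketch below (hypothetical helper names, small $N$ chosen for illustration) confirms this numerically.

```python
import math

def grover_query_matrix(n_items):
    """(A_G)_{xa} = (1 - 2*delta_{xa}) / sqrt(N), as in (2.3)."""
    return [[(1 - 2 * (x == a)) / math.sqrt(n_items) for a in range(n_items)]
            for x in range(n_items)]

def column_overlap(matrix, a, b):
    """Inner product of columns a and b of a real matrix."""
    return sum(row[a] * row[b] for row in matrix)

N = 8
a_g = grover_query_matrix(N)
# Unit-norm columns, but distinct columns overlap by (N - 4) / N.
print(column_overlap(a_g, 0, 0), column_overlap(a_g, 0, 1), (N - 4) / N)
```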



3. Impatient learning 

The column vectors of a membership query matrix (the possible states of the query
register after a single equal superposition membership query) form a special case of a
general situation we can consider, namely a quantum system whose state is one of a set of
$0 < N \in \mathbb{Z}$ unit vectors $\{|v_i\rangle \mid i \in \mathbb{Z}_N\}$ in an $N$-dimensional Hilbert space, $\mathcal{H}$. The task is
to select a measurement to perform that will maximize the probability of correctly guessing
which state the system was in before the measurement was made. This is a special case of
the problem originally considered by Helstrom [22] and Kholevo [23], quantum hypothesis
testing, namely identifying one from among a set of pure quantum states, no matter their
provenance.

Recall that a von Neumann measurement [24] is defined by an orthogonal direct
sum decomposition of the Hilbert space. The measurement is complete if the summands
are one-dimensional. Belavkin [25] and Kennedy [26] have shown that when the $|v_i\rangle$
are linearly independent the optimal quantum measurement is, in fact, a complete von
Neumann measurement. Such a measurement determines an orthonormal basis
$\{|e_i\rangle \mid i \in \mathbb{Z}_N\}$, up to phases. The probability that the system will be in state $|e_i\rangle$ after this
measurement, given that it was in state $|v\rangle$ before the measurement, is $|\langle v|e_i\rangle|^2$. If we
assume that the system has been prepared in one of the states $\{|v_i\rangle\}$, chosen uniformly at
random, then the quantity we want to maximize is

$$\sum_{i \in \mathbb{Z}_N} |\langle v_i|e_i\rangle|^2. \tag{3.1}$$

Necessary and sufficient criteria for solutions to this optimization problem, in the more
general case of arbitrary prior probabilities for the $|v_i\rangle$, can be found in the early quantum
hypothesis testing literature [23,25,27]. In the following we provide a brief, geometrical
derivation of such a criterion.

We can phrase this problem as a question about matrices: If we choose an
isomorphism of Hilbert spaces, $\mathcal{H} \simeq \mathbb{C}^N$, then the list $(|v_1\rangle, \ldots, |v_N\rangle)$ is identified with a square
matrix $A \in M_N(\mathbb{C})$. Making an arbitrary complete measurement is equivalent to making
an arbitrary unitary transformation, followed by a fixed complete measurement in, for
example, the computational basis. Thus we should consider the matrices $SA$, for $S \in U(N)$,
where $U(N)$ denotes the unitary group. We write $A \sim B$ if $B = SA$ for some $S \in U(N)$.
Maximizing the quantity (3.1) is equivalent to maximizing the quantity

$$\|d(B)\|^2 \tag{3.2}$$

over the $U(N)$-orbit of $A$, $\{B \mid B \sim A\}$, where $d : M_N(\mathbb{C}) \to M_N(\mathbb{C})$ is projection onto
diagonal matrices and $\|\cdot\|$ is the $L^2$ (or Frobenius) norm. In the following, when we speak
of critical points of the function (3.2), it will be implicit that the $U(N)$-orbit of $A$ is the
domain. We have the following characterization of the critical points:

Proposition 3.1. The matrix $B$ is a critical point of $\|d(B)\|^2$ if and only if $B\,d(B)^\dagger$ is
Hermitian, i.e.,

$$B\,d(B)^\dagger = d(B)\,B^\dagger,$$

where $\dagger$ denotes the adjoint.

Proof. Let $\mathfrak{u}(N)$ denote the Lie algebra of $U(N)$, i.e., the set of skew-Hermitian matrices.
The criticality condition is that for all $C \in \mathfrak{u}(N)$,

$$\frac{d}{dt}\Big|_{t=0} \|d((1 + tC)B)\|^2 = 0,$$

which is true when $\mathrm{Re}\,\mathrm{tr}\big(d(B)^\dagger d(CB)\big) = 0$. But

$$\mathrm{Re}\,\mathrm{tr}\big(d(B)^\dagger d(CB)\big) = \mathrm{Re}\,\mathrm{tr}\big(CB\,d(B)^\dagger\big),$$

so the condition for $B$ to be critical is that $B\,d(B)^\dagger$ be orthogonal to all skew-Hermitian
matrices, with respect to the inner product $\mathrm{Re}(\mathrm{tr}((\cdot)^\dagger(\cdot)))$. This proves the proposition,
since the orthogonal complement to the space of skew-Hermitian matrices is the space of
Hermitian matrices. ∎

This result seems to have been stated and proved (differently) first by Helstrom, in
the more general setting of an arbitrary probability distribution over the state vectors
[27, Chap. IV, eq. (1.30)]. Since $\|d(B)\|^2$ is invariant under left multiplication by unitary
diagonal matrices, we can restrict our attention to those critical points of (3.2) that have
nonnegative real entries on the diagonal. Now the criticality condition reads

$$B\,d(B) = d(B)\,B^\dagger. \tag{3.3}$$



We would like, however, an explicit solution to (3.3). Consider the Gram matrix of
$A$, $G = A^\dagger A$, with components $G_{ij} = \langle v_i|v_j\rangle$. $G$ is a positive semi-definite Hermitian
matrix. Let $\sqrt{G}$ denote the positive semi-definite Hermitian square root of $G$. By the
polar decomposition of $A$, there is always a unitary matrix $S$ such that $\sqrt{G} = SA$, so it is
natural to ask whether $\sqrt{G}$ is a critical point of (3.2). Proposition 3.1 shows that this is
generally not the case. More precisely, we have the following corollary:

Corollary 3.2. $\sqrt{G}$ is a critical point of (3.2) if and only if $\sqrt{G}$ commutes with its own
diagonal.

If the off-diagonal part of $\sqrt{G}$ is sufficiently general then the conclusion of Corollary 3.2
will force the diagonal to be constant. Although this is a strong condition in general, it is
a very natural simplification [28,23,27,25]. We shall see that it occurs in many structured
learning problems. Moreover, having a constant diagonal is precisely the condition needed
to go beyond impatient learning, which we will do in the next section. So it is a case
worth considering.

Proposition 3.3. Let $G$ be a positive semi-definite Hermitian matrix. Let $\sqrt{G}$ denote the
positive semi-definite Hermitian square root of $G$. Assume the diagonal of $\sqrt{G}$ is constant.
Let $\mathcal{S}$ denote the set of matrices $B$ such that $B \sim \sqrt{G}$ and $B$ has constant diagonal. Then
the maximum of $\|d(B)\|^2$ over $B \in \mathcal{S}$ occurs at $\sqrt{G}$.

Proof. If $B$ has constant diagonal, then $\|d(B)\|^2 = |\mathrm{tr}(B)|^2/N$. So it suffices to prove
that $\sqrt{G}$ gives the maximum value of $|\mathrm{tr}(B)|^2$ over all $B \in \mathcal{S}$. As in the proof of
Proposition 3.1, the critical points occur when $\mathrm{Re}(\mathrm{tr}(B)\,\mathrm{tr}(CB)) = 0$ for all $C \in \mathfrak{u}(N)$. Writing
$\mathrm{tr}(B)\,\mathrm{tr}(CB) = \mathrm{tr}(CB\,\mathrm{tr}(B))$, we see that the critical points are given by the condition
that $B\,\mathrm{tr}(B)$ is Hermitian. Let $B_h = (B + B^\dagger)/2$ and $B_s = (B - B^\dagger)/2$. We want the
skew-Hermitian part of $B\,\mathrm{tr}(B)$ to vanish, thus

$$B_h\,\mathrm{tr}(B_s) + B_s\,\mathrm{tr}(B_h) = 0. \tag{3.4}$$

The trace of (3.4) shows that either $\mathrm{tr}(B_h) = 0$ or $\mathrm{tr}(B_s) = 0$. If both traces vanish then
we get the minimum possible value, $|\mathrm{tr}(B)|^2 = 0$. If this is also the maximum then $\sqrt{G}$ is
forced to vanish since it is positive semi-definite, so the statement is true in this case. If
only one of the traces vanishes, the maximum occurs at a point where $B_s = 0$ or $B_h = 0$.
Since multiplication by $i$ is a symmetry of $|\mathrm{tr}(B)|^2$, we may assume $B_s = 0$. Then $B$ is
some square root of $G$. The maximum of $|\mathrm{tr}(B)|^2$ will occur when one chooses the same
sign for the square root of each eigenvalue, e.g., when $B = \sqrt{G}$. ∎

Remark. As we have noted above, both Proposition 3.1 and Proposition 3.3 have long
been known in the context of quantum hypothesis testing. Nevertheless, we have included
our proofs of these results in order to emphasize the connection with a similar optimization
problem: These new proofs are inspired by the proof of the result that the minimum of
the distance $\|B - I\|$ over the set $B \sim \sqrt{G}$ is given by $B = \sqrt{G}$, irrespective of any
assumption about the diagonal. In particular, for an arbitrary invertible matrix $A$, with
polar decomposition $A = S^{-1}P$, where $S$ is unitary and $P$ is positive definite Hermitian,
the closest point to $I$ in the $U(N)$-orbit of $A$ is $P$; this is the solution to the Procrustes
Problem [29]. Most recently, Eldar and Forney have noted that when the diagonal of $\sqrt{G}$ is
constant, this solution to this optimization problem is also the solution to the optimization
problem (3.1) that is the relevant one for quantum measurement [30].

Thus we have the following quantum algorithm for a concept learning problem with
membership query matrix $A_C$:

Impatient Learning

1. Prepare the query register in the equal superposition state, $F^\dagger|0\rangle$, where $F$ is the
$|X|$-dimensional discrete Fourier transform. (Any unitary map taking $|0\rangle$ to the
equal superposition state works; in the case where $X = \mathbb{Z}_2^n$, $H^{\otimes n}$ can be applied.)

2. Query the membership oracle, obtaining as the state the $c$th column of $A_C$,
$U_cF^\dagger|0\rangle$.

3. Apply a unitary transformation $S_C$ such that $B_C = S_CA_C$ satisfies (3.3).

4. Measure the resulting state $S_CU_cF^\dagger|0\rangle$ in the computational basis.
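
The four steps above can be sketched as a short computation on matrices. This is only an illustration under simplifying assumptions: matrices are real and stored as lists of rows, the post-query transform is supplied by the caller rather than derived, and the function names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two real matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def success_probabilities(transform, query_matrix):
    """For every target c: column c of A is the post-query state (steps 1-2),
    the transform S is applied (step 3), and (S A)_{cc}^2 is the chance that
    the measurement (step 4) returns c."""
    b = matmul(transform, query_matrix)
    return [b[c][c] ** 2 for c in range(len(b))]

n = 2
size = 2 ** n
hadamard = [[(-1) ** bin(x & y).count("1") / math.sqrt(size) for y in range(size)]
            for x in range(size)]
# For Bernstein-Vazirani the query matrix equals the Hadamard matrix itself.
print(success_probabilities(hadamard, hadamard))

identity = [[float(x == y) for y in range(size)] for x in range(size)]
grover = [[(1 - 2 * (x == y)) / math.sqrt(size) for y in range(size)]
          for x in range(size)]
# Measuring immediately (S = I) on Grover's class is no better than guessing.
print(success_probabilities(identity, grover))
```

With the Hadamard transform as $S$, every Bernstein-Vazirani target is identified with certainty; with $S = I$ on Grover's class, each target is found with probability $1/N$, which shows why the choice of $S_C$ in step 3 matters.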

As an immediate corollary of Proposition 3.1 we have:

Theorem 3.4. Impatient Learning succeeds with probability $|(B_C)_{cc}|^2$ and is optimal
among single query quantum algorithms that begin with an equal superposition over
membership queries.

As we saw in (2.2), for $BV^n$, $B_{BV} = H^{\otimes n}A_{BV} = I$ maximizes (3.2), so the
Bernstein-Vazirani algorithm is Impatient Learning, and succeeds with probability 1 for every target
concept. Furthermore, as we will see in §5, for Grover's problem, (3.2) is maximized by
$B_G = (2F^\dagger|0\rangle\langle 0|F - I)A_G$. Using (2.3) it is then easy to compute that the diagonal entries
of $B_G$ are all $(3 - 4/N)/\sqrt{N}$, so Impatient Learning succeeds with asymptotic probability
$9/N$ as $N \to \infty$. Theorem 3.4 says that this is the best we can do using only a single
membership query. Although it is certainly an improvement over the success probability
$1/N$ of random guessing, Impatient Learning is far from satisfactory for this problem.
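
The diagonal entry $(3 - 4/N)/\sqrt{N}$ can be reproduced directly from (2.3). The sketch below is illustrative only: it uses the fact that $F^\dagger|0\rangle$ is the uniform vector $|u\rangle$, so that $2F^\dagger|0\rangle\langle 0|F - I = 2|u\rangle\langle u| - I$, and applies that reflection to one column of $A_G$. The function name is hypothetical.

```python
import math

def grover_impatient_amplitude(n_items, target=0):
    """Amplitude <a| (2|u><u| - I) U_a |u> for the uniform vector u."""
    u = 1 / math.sqrt(n_items)
    # Column `target` of A_G: the uniform state with the target's sign flipped.
    column = [u * (-1 if x == target else 1) for x in range(n_items)]
    overlap = sum(u * v for v in column)                # <u|column>
    reflected = [2 * overlap * u - v for v in column]   # (2|u><u| - I) column
    return reflected[target]

for n_items in (4, 64, 1024):
    amp = grover_impatient_amplitude(n_items)
    closed_form = (3 - 4 / n_items) / math.sqrt(n_items)
    # The last column, N * amp^2, approaches 9 as N grows.
    print(n_items, round(amp, 6), round(closed_form, 6), round(n_items * amp ** 2, 4))
```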
4. Beyond Impatient Learning

In fact, for most concept learning problems a single membership query simply does
not provide enough information to learn the target concept with probability close to 1. A
specific target concept $c$ does, however, define a subspace of the Hilbert space $\mathbb{C}^{|X|}$, namely
$\mathrm{span}\{|c\rangle\}$ (recall that there is a bijection between $X$ and $C$), so we can apply one of the few
general quantum algorithm techniques: amplitude amplification [7-11]. This technique,
invented by Brassard and Høyer [8] as a generalization of Grover's algorithm [7], can be
described in terms of concepts:

Amplitude Amplification ([11], Theorem 2). Let $\chi$ be a concept over $X$; let $\mathcal{H}_1$ denote
the subspace of $\mathbb{C}^{|X|}$ spanned by the vectors labeled by the elements in the extension of $\chi$,
$\chi^{-1}(1)$; and let $\Pi$ denote the projection $\mathbb{C}^{|X|} \to \mathcal{H}_1$. For any unitary transformation $W$ of
$\mathbb{C}^{|X|}$, let $p(W) = |\Pi W|0\rangle|^2$ be the probability with which the state $W|0\rangle$ is measured to
be in the subspace $\mathcal{H}_1$. As long as $p(W) > 0$, we can set $\sin^2\theta = p(W)$ for $0 < \theta \le \pi/2$.
In this case, repeatedly applying the unitary transformation $WU_{\delta_0}W^\dagger U_\chi$ amplifies the
probability of measuring the state to be in the subspace $\mathcal{H}_1$. More precisely,

$$p\big((WU_{\delta_0}W^\dagger U_\chi)^m W\big) \ge \max\{1 - p(W),\, p(W)\},$$

where $m = \lfloor \pi/(4\theta) - 1/2 \rceil$, the nearest integer to $\pi/(4\theta) - 1/2$.
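
The bound in the statement above can be explored numerically. The sketch below assumes the standard fact, not spelled out in this section, that each application of the iterated transformation rotates the state by $2\theta$ toward $\mathcal{H}_1$, so that the success probability after $m$ rounds is $\sin^2((2m+1)\theta)$. The function names are hypothetical.

```python
import math

def amplification_rounds(p):
    """Return (m, theta): m is the nearest integer to pi/(4*theta) - 1/2,
    where sin^2(theta) = p and 0 < p <= 1."""
    theta = math.asin(math.sqrt(p))
    m = max(0, round(math.pi / (4 * theta) - 0.5))
    return m, theta

def amplified_probability(p):
    """Return (m, sin^2((2m+1)*theta)), the probability after m rounds."""
    m, theta = amplification_rounds(p)
    return m, math.sin((2 * m + 1) * theta) ** 2

for p in (0.9, 0.3, 0.01, 0.0001):
    m, amplified = amplified_probability(p)
    # The amplified probability is at least max(1 - p, p).
    print(p, m, round(amplified, 4), amplified >= max(1 - p, p) - 1e-12)
```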

After step 3 of Impatient Learning, the state is $S_CU_cF^\dagger|0\rangle$, where $S_C$ was chosen to
maximize $\sum_{c \in C} |\langle c|S_CU_cF^\dagger|0\rangle|^2$. Thus, letting $\chi = \delta_c$, $\mathcal{H}_1 = \mathrm{span}\{|c\rangle\}$ and $W_c = S_CU_cF^\dagger$,
applying Amplitude Amplification gives a new quantum algorithm:

Amplified Impatient Learning

1. Prepare the query register in the equal superposition state, $F^\dagger|0\rangle$, where $F$ is the
$|X|$-dimensional discrete Fourier transform. (Any unitary map taking $|0\rangle$ to the
equal superposition state works; in the case where $X = \mathbb{Z}_2^n$, $H^{\otimes n}$ can be applied.)

2. Query the membership oracle, obtaining as the state the $c$th column of $A_C$,
$U_cF^\dagger|0\rangle$.

3. Apply an impatient learning transform $S_C$, producing $S_CU_cF^\dagger|0\rangle = W_c|0\rangle$.

4. Apply $W_cU_{\delta_0}W_c^\dagger U_{\delta_c}$ $m$ times, where $m = \lfloor \pi/(4\theta) - 1/2 \rceil$, and
$\sin\theta = |\langle c|W_c|0\rangle| = |(B_C)_{cc}|$, with $0 < \theta \le \pi/2$.

5. Measure the resulting state in the computational basis.

As a consequence of Theorem 3.4 and Amplitude Amplification we have:

Theorem 4.1. For problems with $B_C$ having constant diagonal element $s$, Amplified
Impatient Learning succeeds with probability at least $\max\{1 - s^2, s^2\}$. Since each of $W_c$
and $W_c^\dagger$ includes calls to the membership oracle via $U_c$, Amplified Impatient Learning has
sample complexity $2m + 1$, i.e., $O(1/s)$.

Notice, however, that the algorithm uses more than membership queries. The
operation $U_{\delta_c}$ in step 4 is the action of an equivalence oracle responding to a queried concept
(rather than a concept argument). Roughly speaking, the Impatient Learning part of this
algorithm maximizes the amplitude for the target concept $c$ after a single membership
query; then an equivalence oracle is queried about the correctness of this concept. Thus
Amplified Impatient Learning uses both oracles comprising the minimally adequate teacher
defined in §1, making $2m + 1$ membership queries and $m$ equivalence queries.

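At the risk of confusing the membership and equivalence oracles, we can apply
Amplified Impatient Learning to Grover's problem. As we noted at the end of §3 and as we
will compute in §5, for $A_G$ the post-membership query transform is
$S_G = 2F^\dagger|0\rangle\langle 0|F - I = -F^\dagger U_{\delta_0}F$. So for this problem, dropping the irrelevant overall sign,

$$W_G = S_GU_{\delta_a}F^\dagger = F^\dagger U_{\delta_0}FU_{\delta_a}F^\dagger,$$

where this use of $U_{\delta_a}$ is a query to the membership oracle. The iterated transformation is

$$\begin{aligned}
W_GU_{\delta_0}W_G^\dagger U_{\delta_a} &= (F^\dagger U_{\delta_0}FU_{\delta_a}F^\dagger)\,U_{\delta_0}\,(FU_{\delta_a}F^\dagger U_{\delta_0}F)\,U_{\delta_a} \\
&= F^\dagger U_{\delta_0}FU_{\delta_a} \cdot F^\dagger U_{\delta_0}FU_{\delta_a} \cdot F^\dagger U_{\delta_0}FU_{\delta_a} \\
&= (F^\dagger U_{\delta_0}FU_{\delta_a})^3,
\end{aligned}$$

where the final $U_{\delta_a}$ in the first expression is the operation of the equivalence oracle but the
distinction between the two kinds of oracles is ignored in the last expression. The complete
algorithm is then

$$(F^\dagger U_{\delta_0}FU_{\delta_a})^{3m}F^\dagger U_{\delta_0}FU_{\delta_a}F^\dagger|0\rangle = (F^\dagger U_{\delta_0}FU_{\delta_a})^{3m+1}F^\dagger|0\rangle,$$

where $m = \lfloor \pi/(4\theta) - 1/2 \rceil$ and $\theta = \arcsin|(B_G)_{cc}| = \arcsin((3 - 4/N)/\sqrt{N})$. Thus $m \sim \frac{\pi}{4}\sqrt{N}/3$,
so the iterated transformation is applied $\frac{\pi}{4}\sqrt{N}$ times, asymptotically. This is, in fact,
exactly Grover's algorithm [7], although one usually sees it factored differently (and with
$F$ and $F^\dagger$ replaced by $H^{\otimes n}$).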
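
The complete algorithm can be simulated classically for small $N$. The sketch below is a toy state-vector simulation with a hypothetical function name: it uses $F^\dagger U_{\delta_0}F = I - 2|u\rangle\langle u|$, where $|u\rangle = F^\dagger|0\rangle$ is the uniform state, so no explicit Fourier transform is needed, and it requires $N \ge 4$ so that the arcsine is defined.

```python
import math

def amplified_impatient_grover(n_items, target):
    """Apply (F^dagger U_0 F U_a)^(3m+1) to the uniform state and return
    (m, probability of measuring the target)."""
    theta = math.asin((3 - 4 / n_items) / math.sqrt(n_items))
    m = max(0, round(math.pi / (4 * theta) - 0.5))
    state = [1 / math.sqrt(n_items)] * n_items       # F^dagger |0>
    for _ in range(3 * m + 1):
        state[target] = -state[target]               # U_{delta_a}: flip the target's sign
        mean = sum(state) / n_items
        state = [v - 2 * mean for v in state]        # F^dagger U_0 F = I - 2|u><u|
    return m, state[target] ** 2

for n_items in (4, 64, 1024):
    m, prob = amplified_impatient_grover(n_items, target=3)
    s = (3 - 4 / n_items) / math.sqrt(n_items)
    # Compare against the guarantee max(1 - s^2, s^2) of Theorem 4.1.
    print(n_items, m, 3 * m + 1, round(prob, 4), round(max(1 - s * s, s * s), 4))
```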

5. Concept classes with group symmetry 

The sets $\mathbb{Z}_2^n$ and $\mathbb{Z}_N$, over which the Bernstein-Vazirani and Grover concept classes are
defined, are abelian groups under componentwise addition modulo 2 and addition modulo
$N$, respectively. In each case the Hilbert space $\mathbb{C}^G$ becomes a ring, with multiplication law
defined by linear extension from

$$|x\rangle * |y\rangle = |x + y\rangle \quad \text{for } x, y \in G,$$

where $G$ is $\mathbb{Z}_2^n$ or $\mathbb{Z}_N$.

Definition. The group algebra of $G$ is the Hilbert space $\mathbb{C}^G$ (often written $\mathbb{C}[G]$), equipped
with this ring structure.

The regular representation of the group algebra is the map

$$\mathbb{C}^G \ni |v\rangle \mapsto L_v \in M_{|G|}(\mathbb{C}),$$

where $L_v$ is left multiplication by $|v\rangle$, a linear map on $\mathbb{C}^G$, hence a $|G| \times |G|$ complex
matrix in the computational basis. We will identify the group algebra with its image in
this representation.



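For $G = \mathbb{Z}_2$, $\mathbb{C}^{\mathbb{Z}_2} \ni |v\rangle = \alpha|0\rangle + \beta|1\rangle$ is a general element of the group algebra. From
the definition,

$$\begin{aligned}
|v\rangle * |0\rangle &= \alpha|0\rangle + \beta|1\rangle \\
|v\rangle * |1\rangle &= \alpha|1\rangle + \beta|0\rangle,
\end{aligned}$$

so

$$L_v = \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix} = \alpha I + \beta X, \tag{5.1}$$

where $X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = L_1$ is the usual Pauli matrix. More generally, we have: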



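Proposition 5.1. The group algebra of $\mathbb{Z}_2^n$ consists of $2^n \times 2^n$ dimensional matrices of
the form

$$L_v = \sum_{x \in \mathbb{Z}_2^n} v_x X^x$$

for

$$|v\rangle = \sum_{x \in \mathbb{Z}_2^n} v_x|x\rangle \in \mathbb{C}^{\mathbb{Z}_2^n} = (\mathbb{C}^2)^{\otimes n}.$$

(In the expression for $L_v$, $x \in \mathbb{Z}_2^n$ is a multi-index, i.e., $X^x = X^{x_1 \cdots x_n} = X^{x_1} \otimes \cdots \otimes X^{x_n}$.)
$|v\rangle$ is the first column of $L_v$; $L_v$ is symmetric and has constant diagonal. $L_v$ is diagonalized
by the Hadamard transform.

Proof. By induction on $n$. That the Hadamard transform diagonalizes the elements of
the group algebra follows from the familiar Pauli matrix identity $Z = HXH$, where
$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. ∎

The group $\mathbb{Z}_N$ is generated by the element 1, and for $y \in \mathbb{Z}_N$,

$$L_1 : |y\rangle \mapsto |1 + y\rangle, \qquad
L_1 = \begin{pmatrix}
0 & & & 1 \\
1 & 0 & & \\
 & \ddots & \ddots & \\
 & & 1 & 0
\end{pmatrix}.$$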



Analogously to Proposition 5.1, for $\mathbb{Z}_N$ we have:

Proposition 5.2. The group algebra of $\mathbb{Z}_N$ consists of $N \times N$ dimensional matrices of
the form

$$L_v = \sum_{x \in \mathbb{Z}_N} v_x L_1^x,$$

where

$$|v\rangle = \sum_{x \in \mathbb{Z}_N} v_x|x\rangle \in \mathbb{C}^{\mathbb{Z}_N} = \mathbb{C}^N.$$

$|v\rangle$ is the first column of $L_v$; $L_v$ need not be symmetric, but it has constant diagonal. $L_v$
is diagonalized by the $N$-dimensional discrete Fourier transform.

Proof. That the Fourier transform diagonalizes the elements of the $\mathbb{Z}_N$ group algebra
follows from the fact that $FL_1F^\dagger = \mathrm{diag}(1, \omega, \omega^2, \ldots, \omega^{N-1})$, where $\omega = e^{2\pi i/N}$. ∎
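
The properties claimed for the $\mathbb{Z}_N$ group algebra can be verified on a small example. The sketch below is illustrative only: the function names are hypothetical, and the sign convention chosen for the discrete Fourier transform (entries $\omega^{-jk}/\sqrt{N}$) is an assumption, since either convention diagonalizes $L_v$.

```python
import cmath
import math

def regular_representation(v):
    """L_v for Z_N: (L_v)[y][x] = v[(y - x) mod N], so L_v|x> = sum_k v_k |k + x>."""
    n = len(v)
    return [[v[(y - x) % n] for x in range(n)] for y in range(n)]

def dft(n):
    """Unitary DFT matrix; the sign convention omega^(-jk) is an assumption."""
    w = cmath.exp(2j * cmath.pi / n)
    return [[w ** (-j * k) / math.sqrt(n) for k in range(n)] for j in range(n)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def dagger(m):
    """Conjugate transpose."""
    return [[m[k][j].conjugate() for k in range(len(m))] for j in range(len(m[0]))]

v = [0.5, 2.0, -1.0, 0.0, 3.0]
L = regular_representation(v)
F = dft(len(v))
D = matmul(matmul(F, L), dagger(F))                  # should be diagonal
off_diagonal = max(abs(D[j][k]) for j in range(5) for k in range(5) if j != k)
# First column is v, the diagonal is constant, and F L_v F^dagger is diagonal.
print([row[0] for row in L] == v, {L[i][i] for i in range(5)} == {v[0]},
      off_diagonal < 1e-9)
```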

Propositions 5.1 and 5.2 allow us to characterize useful symmetries of concept classes:

Proposition 5.3. Let $C$ be a matched concept class over an abelian group $G$. Then $A_C$
is in the group algebra of $G$ if and only if it commutes with the action of $G$, i.e.,

$$L_gA_C = A_CL_g, \quad \forall g \in G. \tag{5.2}$$

In components (5.2) becomes

$$(A_C)_{x+g,\,c+g} = (A_C)_{x,c}, \quad \forall g \in G; \tag{5.3}$$

equivalently $c(x)$ is a function of $c - x$.

In Grover's problem, $A_G$ satisfies (5.2) and (5.3) for $G = \mathbb{Z}_N$ (and for $G = \mathbb{Z}_2^n$, when
$N = 2^n$). Thus this membership query matrix belongs to the group algebra of $\mathbb{Z}_N$ (and
of $\mathbb{Z}_2^n$, when $N = 2^n$) and is diagonalized by $F$ (and by $H^{\otimes n}$, when $N = 2^n$). It is,
furthermore, a real symmetric matrix. The following proposition explains how to compute
the optimal transformation $S$ required for Impatient Learning and Amplified Impatient
Learning in this case.

Proposition 5.4. Let $A$ be a real, symmetric matrix in the group algebra of $\mathbb{Z}_2^n$ (or $\mathbb{Z}_N$).
Using the Spectral Theorem, define $|A|$ by

$$|A|v = |\lambda|v,$$

for each eigenvector-eigenvalue pair $(v, \lambda)$ of $A$. Then $|A|$ is also an element of the same
group algebra. Moreover, the maximum value of $\|d(B)\|^2$ over matrices $B \sim A$ with
constant diagonal occurs at $|A|$.

Proof. Since $A$ is real and symmetric, $A$ is a square root of its Gram matrix. Conjugation by
the appropriate transform ($H^{\otimes n}$ or $F$) diagonalizes $A$ so $|A|$, having the same eigenvectors,
is also an element of the same group algebra as $A$. Moreover, $|A|$ is the positive semi-definite
square root of the Gram matrix. Thus the result follows from Proposition 3.3. ∎

Proposition 5.4, applied to a symmetric membership query matrix $A_C$ satisfying the
conditions of Proposition 5.3, implies that an optimal unitary transformation $S_C$ in the
Impatient Learning and Amplified Impatient Learning algorithms satisfies

$$|A_C| = S_CA_C. \tag{5.4}$$






Geometry of quantum learning 



Hunziker, Meyer, Park, Pommersheim & Rothstein 



When $A_C$ is nonsingular, $S_C$ is unique and (5.4) implies that

$$S_C = |A_C|A_C^{-1} =: \mathrm{sign}(A_C),$$

where

$$\mathrm{sign}(A)v = \mathrm{sign}(\lambda)v = \frac{\lambda}{|\lambda|}v$$

for all eigenvector-eigenvalue pairs $(v, \lambda)$ of $A$.



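To compute $S_G$ for Grover's concept class we diagonalize $A_G$:

$$A_G = \frac{1}{\sqrt{N}}\left(NF^\dagger|0\rangle\langle 0|F - 2I\right) = F^\dagger\,\frac{1}{\sqrt{N}}\left(N|0\rangle\langle 0| - 2I\right)F.$$

This implies that

$$S_G = \mathrm{sign}(A_G) = F^\dagger\,\mathrm{diag}(1, -1, \ldots, -1)\,F = F^\dagger(2|0\rangle\langle 0| - I)F = -F^\dagger U_{\delta_0}F,$$

which is the promised expression for $S_G$ that we quoted in §3 and §4.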

6. Learning problems with cyclic symmetry 

Although recognizing Grover's algorithm as an instance of Amplified Impatient Learn- 
ing perhaps contributes to a better understanding of this basic quantum algorithm, we 
would like to apply the general formalism developed in the preceding sections to derive 
new quantum algorithms. So in this section we consider some new problems with cyclic 
symmetry. 

According to Proposition 5.3, any learning problem with a transitive $\mathbb{Z}_N$ action has
the property that the oracle response $c(x)$ depends only on the difference $c - x \bmod N$.
Thus we may write $c(x) = \phi(c - x)$ for some function $\phi : \mathbb{Z}_N \to \mathbb{Z}_2$, whence the membership
query matrix $A_C$ is

$$(A_C)_{xc} = \frac{1}{\sqrt{N}}(-1)^{\phi(c - x)}. \tag{6.1}$$

By Proposition 5.2, $A_C$ is diagonalized by the Fourier transform; hence its eigenvalues are

$$\lambda_j = \frac{1}{\sqrt{N}} \sum_{k \in \mathbb{Z}_N} (-1)^{\phi(k)} e^{2\pi i jk/N} \tag{6.2}$$

for $j \in \mathbb{Z}_N$. By Proposition 5.4, the relevant quantity for Impatient Learning is the size
of the diagonal elements of $|A_C|$. This matrix is in the $\mathbb{Z}_N$ group algebra, and hence
has constant diagonal. Furthermore, the diagonal element $s$ is just the average of the
eigenvalues of $|A_C|$, i.e., the average of the absolute values of the eigenvalues of $A_C$. By
Theorem 4.1, therefore, Amplified Impatient Learning requires $O(1/s)$ queries.
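
The recipe just described (Fourier transform the first column, average the absolute values of the eigenvalues, invert) is short enough to sketch directly. The code below is illustrative: `phi` is a list of 0/1 values indexed by $k \in \mathbb{Z}_N$, the function names are hypothetical, and the $O(N^2)$ direct sum stands in for a fast transform.

```python
import cmath
import math

def diagonal_element(phi):
    """s = (1/N) * sum_j |lambda_j| for the cyclic class c(x) = phi[(c - x) mod N]."""
    n = len(phi)
    total = 0.0
    for freq in range(n):
        lam = sum((-1) ** phi[k] * cmath.exp(2j * cmath.pi * freq * k / n)
                  for k in range(n)) / math.sqrt(n)
        total += abs(lam)
    return total / n

def query_estimate(phi):
    """Order-of-magnitude query count 1/s suggested by Theorem 4.1."""
    return 1 / diagonal_element(phi)

# Grover's class with N = 16: phi(k) = 1 only at k = 0.
grover_phi = [1] + [0] * 15
print(round(diagonal_element(grover_phi), 6), (3 - 4 / 16) / math.sqrt(16))
```

For Grover's class this reproduces the diagonal entry $(3 - 4/N)/\sqrt{N}$ found for $B_G$ in §3.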
Consider a special class of cyclically symmetric problems, which we call BATTLESHIP,
after the Milton Bradley game with the same name. Let $0 < r < N/2$. For any $a, x \in \mathbb{Z}_N$,
set

$$b_a(x) = \begin{cases} 1 & \text{if } a - x = -r, \ldots, r \bmod N; \\ 0 & \text{otherwise.} \end{cases}$$

$d = 2r + 1$ is the length of the battleship, i.e., $d$ counts the number of $x \in \mathbb{Z}_N$ that satisfy
$b_a(x) = 1$ for any fixed $a$.

It turns out that the behavior of BATTLESHIP problems depends on the relative size
of $d$ with respect to $N$. Thus we consider two separate subfamilies of BATTLESHIP: For
the problem SMALLSHIP($d$), we fix the value of $d$ and let $N$ be arbitrary. For the problem
BIGSHIP($\alpha$), we again let $N$ be arbitrary, but fix the ratio $\alpha \in (0, 1/2)$ of $d$ to $N$. That is,
we take $d = [\alpha N]$.
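
As a concrete illustration of the definition, the sketch below builds the concept $b_a$ as a Python function and lists its extension for one small case. The name `battleship_concept` is hypothetical.

```python
def battleship_concept(a, r, n):
    """Return b_a, where b_a(x) = 1 iff a - x is congruent to one of
    -r, ..., r modulo n (assumes 0 < r < n / 2)."""
    def b(x):
        diff = (a - x) % n
        return int(diff <= r or diff >= n - r)
    return b

n, r, a = 10, 2, 1
b_a = battleship_concept(a, r, n)
extension = [x for x in range(n) if b_a(x) == 1]
# The ship occupies the d = 2r + 1 cells centered on a, wrapping around mod n.
print(extension, len(extension) == 2 * r + 1)
```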

Theorem 6.1. For any fixed $d$, Amplified Impatient Learning solves SMALLSHIP($d$) with
$O(\sqrt{N})$ queries, which is optimal to within a constant factor. When applied to BIGSHIP($\alpha$),
however, Amplified Impatient Learning requires $\Omega(\sqrt{N}/\log N)$ queries, which is far from
optimal.

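Proof. The eigenvalues of $A_{BS}$ for the BATTLESHIP concept class with parameters $N$ and
$r$ are

$$\lambda_j = \frac{1}{\sqrt{N}}\left(-\sum_{k=-r}^{r} e^{2\pi i jk/N} + \sum_{k=r+1}^{N-r-1} e^{2\pi i jk/N}\right).$$

In particular, this gives

$$\lambda_0 = \frac{1}{\sqrt{N}}(N - 2d),$$

while for $j > 0$,

$$\lambda_j = -\frac{2}{\sqrt{N}}\,\frac{\sin(\pi jd/N)}{\sin(\pi j/N)}. \tag{6.3}$$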
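
The closed forms for $\lambda_0$ and $\lambda_j$ can be checked against the defining Fourier sum. The sketch below does this for one small choice of $N$ and $r$; the function names are hypothetical and the check is numerical, not a proof.

```python
import cmath
import math

def eigenvalue_from_sum(freq, r, n):
    """lambda_j as the Fourier sum: sign -1 on the ship (k = -r..r), +1 elsewhere."""
    total = 0
    for k in range(-r, n - r):
        sign = -1 if k <= r else 1
        total += sign * cmath.exp(2j * cmath.pi * freq * k / n)
    return total / math.sqrt(n)

def eigenvalue_closed_form(freq, r, n):
    """lambda_0 = (N - 2d)/sqrt(N); otherwise the ratio of sines in (6.3)."""
    d = 2 * r + 1
    if freq == 0:
        return (n - 2 * d) / math.sqrt(n)
    return (-2 * math.sin(math.pi * freq * d / n)
            / (math.sqrt(n) * math.sin(math.pi * freq / n)))

n, r = 12, 2
worst = max(abs(eigenvalue_from_sum(f, r, n) - eigenvalue_closed_form(f, r, n))
            for f in range(n))
print(worst < 1e-9)
```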



First, consider the case of SMALLSHIP($d$) for fixed $d$. Since the expression $\sin(\pi j/N)$
in the denominator of (6.3) is bounded above in absolute value by 1, it follows that

$$s\sqrt{N} \ge \frac{2}{N} \sum_{j=1}^{N-1} \left| \sin\frac{\pi j d}{N} \right|.$$

As $N$ tends to infinity, the right hand side approaches the constant value
$2\int_0^1 |\sin d\pi x|\,dx = 4/\pi$. We conclude that $s = \Omega(1/\sqrt{N})$ and hence Amplified
Impatient Learning has query complexity $O(\sqrt{N})$. To see that this is optimal to within a
constant factor, note that for each $a$ there are only $d$ values of $x$ for which $b_a(x) = 1$. Thus
any classical learning algorithm which uses only the membership oracle requires $\Omega(N/d)$
queries. Note also that the equivalence oracle can be simulated using exactly 2 calls to the
membership oracle, since $c = b_a$ if and only if $c(a + r) = 1$ and $c(a + r + 1) = 0$. Hence the
equivalence oracle is unnecessary, and the results of Servedio and Gortler [31] imply that we
can achieve at most a quadratic speedup over the classical algorithm. Thus $\Omega(\sqrt{N/d})$
quantum queries are required. Since $d$ is constant, we see that for SMALLSHIP($d$) Amplified
Impatient Learning is optimal up to a constant factor.
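
The closed-form eigenvalues and the lower bound on $s\sqrt{N}$ can be checked numerically. The sketch below assumes the matrix entries are $(-1)^{b_a(x)}/\sqrt{N}$ with Fourier phase $e^{2\pi i/N}$; the function names and the `report` dictionary are illustrative only.

```python
import cmath
import math

def battleship_eigenvalues(N, r):
    """Eigenvalues computed directly from the sum over offsets k in Z_N."""
    def in_ship(k):
        return k <= r or k >= N - r          # offsets -r, ..., r mod N

    def root(m):
        return cmath.exp(2j * cmath.pi * (m % N) / N)

    return [
        sum((-1 if in_ship(k) else 1) * root(j * k) for k in range(N)) / math.sqrt(N)
        for j in range(N)
    ]

def closed_form(N, r, j):
    """lambda_0 = (N - 2d)/sqrt(N); lambda_j = -2 sin(pi j d/N) / (sqrt(N) sin(pi j/N))."""
    d = 2 * r + 1
    if j == 0:
        return (N - 2 * d) / math.sqrt(N)
    return -2 * math.sin(math.pi * j * d / N) / (math.sqrt(N) * math.sin(math.pi * j / N))

r = 1                                        # ship length d = 3
d = 2 * r + 1
report = {}
for N in (16, 64, 256):
    lam = battleship_eigenvalues(N, r)
    max_err = max(abs(lam[j] - closed_form(N, r, j)) for j in range(N))
    s = sum(abs(v) for v in lam) / N
    lower = (2 / N) * sum(abs(math.sin(math.pi * j * d / N)) for j in range(1, N))
    report[N] = (max_err, s * math.sqrt(N), lower)
    print(N, report[N])
```

In each row the second number, $s\sqrt{N}$, should stay above the third, which in turn approaches $4/\pi$ as $N$ grows.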

Second, consider BIGSHIP($\alpha$) for fixed $\alpha$. In this case we claim that $s = O((\log N)/\sqrt{N})$.
To see this, note first that $\lambda_0/N$ is $O(1/\sqrt{N})$. Bounding each of the sines in the numerator
of (6.3) by 1, we find that for $j > 0$, $\lambda_j/N$ is bounded above in absolute value:

$$\frac{1}{N} |\lambda_j| \le \frac{2}{N\sqrt{N}} \csc\frac{\pi j}{N}.$$

It follows that $\lambda_1/N$ and $\lambda_{N-1}/N$ are $O(1/\sqrt{N})$, while the remaining sum satisfies

$$\frac{1}{N} \sum_{j=2}^{N-2} |\lambda_j| \le \frac{2}{N\sqrt{N}} \sum_{j=2}^{N-2} \csc\frac{\pi j}{N} \le \frac{2}{\sqrt{N}} \int_{1/N}^{1-1/N} \csc \pi x \, dx = O\!\left(\frac{\log N}{\sqrt{N}}\right).$$

Thus the number of steps required by Amplified Impatient Learning is $\Omega(\sqrt{N}/\log N)$. To
see that this is not an optimal algorithm, consider using Grover's algorithm to return some
$x$ for which $b_a(x) = 1$. This requires $O(\sqrt{N/d})$ quantum queries [32], and narrows the
possible answer space to a set of size $d$. A classical binary search, requiring $\log d$ further
(classical) queries, can now be used to identify the answer $a$ uniquely. This alternative
algorithm solves BIGSHIP($\alpha$) with only $O(\sqrt{N/d} + \log d) = O(\sqrt{1/\alpha} + \log \alpha N) = O(\log N)$
queries, far fewer than the $\Omega(\sqrt{N}/\log N)$ required by Amplified Impatient Learning. ∎
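
The classical second stage of this alternative algorithm can be sketched as follows. The Grover stage is not simulated: the sketch simply assumes it has already returned some $x_0$ with $b_a(x_0) = 1$. The names `locate_ship` and `member` are hypothetical, and the search relies on $d < N/2$ so that $x_0 + d$ is guaranteed to lie outside the ship.

```python
import math

def locate_ship(member, x0, r, N):
    """Given x0 with member(x0) == 1, return (a, queries) for the ship centre a.

    member(x) plays the role of the membership oracle b_a(x).
    Binary search finds the largest offset t in [0, d) with member(x0 + t) == 1,
    i.e. the right-hand end a + r of the ship.
    """
    d = 2 * r + 1
    lo, hi = 0, d            # invariant: member(x0 + lo) == 1 and member(x0 + hi) == 0
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if member((x0 + mid) % N):
            lo = mid
        else:
            hi = mid
    return (x0 + lo - r) % N, queries

N, r = 64, 10
d = 2 * r + 1
worst = 0
ok = True
for a in range(N):
    member = lambda x, a=a: 1 if (a - x + r) % N <= 2 * r else 0
    for t in range(-r, r + 1):               # every possible starting point in the ship
        found, q = locate_ship(member, (a + t) % N, r, N)
        ok = ok and found == a
        worst = max(worst, q)
print(ok, worst, math.ceil(math.log2(d)))
```

The number of oracle calls in the search never exceeds $\lceil \log_2 d \rceil$, matching the $\log d$ term in the query count above.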



7. The Majority problem 

The other group algebra introduced in §5 is that of $\mathbb{Z}_2^n$. In this section we study a
novel problem, MAJORITY, that has this symmetry. Fix a positive integer $n$, and for each
$a \in \mathbb{Z}_2^n$ define a function $m_a : \mathbb{Z}_2^n \to \mathbb{Z}_2$ by

$$m_a(x) = \begin{cases} 1 & \text{if } \mathrm{wt}(a - x) \le n/2; \\ 0 & \text{otherwise.} \end{cases}$$

That is, $m_a(x) = 1$ when the bit strings $a$ and $x$ agree in at least as many bits as they
disagree. The MAJORITY concept class $\mathcal{MAJ}^n$ is defined to be the set of all functions
$m_a$, where $a$ is any element of $\mathbb{Z}_2^n$. It is easy to see that any classical learning algorithm
requires at least $n$ queries. We can do better quantum mechanically:

Theorem 7.1. Amplified Impatient Learning solves MAJORITY with $O(\sqrt{n})$ quantum
queries, given access to both the membership oracle and the equivalence oracle.

If $n$ is an odd integer, the membership query matrix for $\mathcal{MAJ}^n$ contains in its upper
left-hand corner (rows and columns labeled $0\ldots0$ to $01\ldots1$ top-to-bottom and left-to-right,
respectively) a $2^{n-1} \times 2^{n-1}$ submatrix proportional to the membership query matrix for
$\mathcal{MAJ}^{n-1}$. Furthermore, if $a, b \in \mathbb{Z}_2^n$ are complementary bit strings then $m_a(x) = 1 - m_b(x)$
for all $x$, and hence the column in the membership query matrix corresponding to $a$
equals the negative of the column corresponding to $b$. It follows that if one can learn
a concept from $\mathcal{MAJ}^{n-1}$, then one can learn a concept from $\mathcal{MAJ}^n$ with one additional
membership query. Thus, in what follows, we will assume that $n$ is an even integer.
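
Both facts used in this reduction can be verified exhaustively for a small odd $n$. The sketch below encodes bit strings as Python integers, so that subtraction in $\mathbb{Z}_2^n$ is bitwise XOR; the helper names `wt` and `maj` are illustrative.

```python
def wt(z):
    """Hamming weight of a non-negative integer viewed as a bit string."""
    return bin(z).count("1")

def maj(a, n):
    """The concept m_a on n-bit strings: 1 iff a and x disagree in at most n/2 places."""
    return lambda x: 1 if 2 * wt(a ^ x) <= n else 0

n = 5                                   # a small odd example
full = (1 << n) - 1                     # the all-ones string
half = 1 << (n - 1)                     # strings below `half` have leading bit 0

# Complementary strings give complementary concepts when n is odd.
complement_ok = all(
    maj(a, n)(x) == 1 - maj(a ^ full, n)(x)
    for a in range(1 << n) for x in range(1 << n)
)

# On strings with leading bit 0, m_a agrees with the corresponding MAJ^(n-1) concept.
block_ok = all(
    maj(a, n)(x) == maj(a, n - 1)(x)
    for a in range(half) for x in range(half)
)
print(complement_ok, block_ok)
```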

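For learning problems with $\mathbb{Z}_2^n$ symmetry we have the following analogues of (6.1) and
(6.2): Since $c(x) = \phi(c - x)$ for some $\phi : \mathbb{Z}_2^n \to \mathbb{Z}_2$, the membership query matrix has the
form

$$A_C = \frac{1}{\sqrt{2^n}} \sum_{b \in \mathbb{Z}_2^n} (-1)^{\phi(b)}\, b. \tag{7.1}$$

By Proposition 5.1, $A_C$ is diagonalized by the Hadamard transform; hence its eigenvalues
are

$$\lambda_c = \frac{1}{\sqrt{2^n}} \sum_{b \in \mathbb{Z}_2^n} (-1)^{\phi(b)} (-1)^{b \cdot c} \tag{7.2}$$

for $c \in \mathbb{Z}_2^n$. With these preliminaries in place, we can prove Theorem 7.1: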

Proof. For the concept class $\mathcal{MAJ}^n$, $\phi(b) = \Theta(\frac{n}{2} - \mathrm{wt}(b))$, where the Heaviside function
$\Theta(z) = 1$ if $z \ge 0$, and vanishes otherwise. It is easy to see in (7.2) that for this problem
the value of $\lambda_c$ depends only on $k = \mathrm{wt}(c)$. Thus, for $k \in \{0, \ldots, n\}$, we may set $\lambda_{n,k} = \lambda_c$,
where $c \in \mathbb{Z}_2^n$ is any bit string of weight $k$. To calculate $\lambda_{n,k}$, we consider the string
$c = 0^{n-k}1^k$, which has weight $k$. For any $b \in \mathbb{Z}_2^n$, let $s$ denote the number of 1s in the
first $n - k$ bits of $b$, and let $r$ denote the number of 1s in the remaining $k$ bits of $b$. Then
$b \cdot c = r$, and $\phi(b) = \Theta(\frac{n}{2} - (r + s))$. Since the number of bit strings $b$ with given values
for $r$ and $s$ is $\binom{n-k}{s}\binom{k}{r}$, we have

$$\lambda_{n,k} = \frac{1}{\sqrt{2^n}} \sum_{r,s} \binom{n-k}{s} \binom{k}{r} (-1)^{\Theta(\frac{n}{2} - (r+s))} (-1)^r.$$

Using standard combinatorial techniques, this sum simplifies to give

$$\lambda_{n,k} = \begin{cases} \dfrac{(-1)^{k/2+1}}{\sqrt{2^n}} \dbinom{n}{n/2} \dfrac{1 \cdot 3 \cdots (k-1)}{(n-1)(n-3)\cdots(n-k+1)} & \text{for } k \text{ even;} \\[2ex] \lambda_{n,k-1} & \text{for } k \text{ odd.} \end{cases}$$

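It follows that for any even number $n$, the eigenvalue of smallest absolute value is the
middle eigenvalue $\lambda_{n,n/2}$, which is given by

$$|\lambda_{n,n/2}| = \begin{cases} \dfrac{1}{\sqrt{2^n}} \dbinom{n/2}{n/4} & \text{if } n \equiv 0 \bmod 4; \\[2ex] \dfrac{2}{\sqrt{2^n}} \dbinom{(n-2)/2}{(n-2)/4} & \text{if } n \equiv 2 \bmod 4. \end{cases}$$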



Since, by Stirling's formula, each of these expressions is asymptotic to $2/\sqrt{\pi n}$, and so is
$\Theta(1/\sqrt{n})$, we find that the average $s$ of the absolute values of the eigenvalues of
$A_{\mathcal{MAJ}^n}$ is $\Omega(1/\sqrt{n})$. Thus the quantum query complexity of MAJORITY is
$O(1/s) = O(\sqrt{n})$, as claimed. ∎
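
The formulas in this proof can be cross-checked for small even $n$. The sketch below evaluates the double sum over $(r, s)$ directly and compares it with the closed forms, assuming the $1/\sqrt{2^n}$ normalisation and the convention $\Theta(0) = 1$; all function names are illustrative.

```python
import math

def theta(z):
    """Heaviside function with theta(0) = 1."""
    return 1 if z >= 0 else 0

def eig_sum(n, k):
    """lambda_{n,k} from the double sum over (r, s); n is assumed even."""
    total = sum(
        math.comb(n - k, s) * math.comb(k, r) * (-1) ** theta(n // 2 - (r + s)) * (-1) ** r
        for r in range(k + 1) for s in range(n - k + 1)
    )
    return total / math.sqrt(2 ** n)

def eig_closed(n, k):
    """Closed form for lambda_{n,k}; an odd k reuses the value at k - 1."""
    if k % 2:
        k -= 1
    ratio = 1.0
    for i in range(1, k, 2):            # i = 1, 3, ..., k - 1
        ratio *= i / (n - i)
    return (-1) ** (k // 2 + 1) * math.comb(n, n // 2) * ratio / math.sqrt(2 ** n)

def middle_abs(n):
    """|lambda_{n,n/2}| from the two-case formula."""
    if n % 4 == 0:
        return math.comb(n // 2, n // 4) / math.sqrt(2 ** n)
    return 2 * math.comb((n - 2) // 2, (n - 2) // 4) / math.sqrt(2 ** n)

def average_abs(n):
    """s: average of |lambda_c| over all 2^n strings c, grouped by weight k."""
    return sum(math.comb(n, k) * abs(eig_closed(n, k)) for k in range(n + 1)) / 2 ** n

for n in (4, 6, 8, 10):
    err = max(abs(eig_sum(n, k) - eig_closed(n, k)) for k in range(n + 1))
    print(n, err, middle_abs(n), average_abs(n) * math.sqrt(n))
```

The last column, $s\sqrt{n}$, is printed only to show the scaling; no particular limiting value is claimed here.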

8. Conclusion

In this paper we have derived a general technique, Amplified Impatient Learning, for
quantum concept learning from a minimally adequate teacher. We applied it to two novel
problems, BATTLESHIP and MAJORITY, that, like the problems of Bernstein-Vazirani and
Grover, can be recognized as concept learning problems. Amplified Impatient Learning
solves SMALLSHIP($d$) with $O(\sqrt{N})$ queries, an improvement over the $\Omega(N)$ queries required
classically. For BIGSHIP($\alpha$), Amplified Impatient Learning performs less well, but we gave
an alternative quantum algorithm with sample complexity $O(\log N)$. Finally, Amplified
Impatient Learning solves MAJORITY with $O(\sqrt{n})$ quantum queries, again an improvement
over the $\Omega(n)$ required classically.

Quantum algorithms for concept learning were first considered by Bshouty and Jackson
[33], who analyzed the traditional DNF learning problem [34]. Subsequently, Servedio
and Gortler proved some general lower bounds on the quantum sample complexity of learning
from any membership oracle [31]. Their results, together with algorithms derived in
this paper, motivate us to make a pair of conjectures about general upper bounds on the
quantum sample complexity of learning from a membership oracle:

Conjecture 1. For any family of concept classes $\{C_i\}$ with $|C_i| \to \infty$, there exists a
quantum learning algorithm with membership oracle query complexity $O(\sqrt{|C_i|})$.

Our quantum algorithm for SMALLSHIP($d$) (which specializes to Grover's algorithm
when $d = 1$) saturates this bound; the idea is that these minimally structured search
problems are concept learning problems that are as difficult as any of the same size. As we
noted in the proof of Theorem 6.1, the calls to the equivalence oracle in this problem can be
replaced by calls to the membership oracle, so our results are consistent with Conjecture 1.

The difficulty of concept learning problems depends on more than the number of
concepts $|C|$ among which the target concept lies, however; it also depends on how similar
distinct concepts are. Servedio and Gortler express their lower bounds in terms of a
quantity $\gamma^C$ that measures this similarity: For any $C' \subset C$, define
$C'_{x,b} = \{c \in C' \mid c(x) = b\}$. Then

$$\gamma^C = \min_{C' \subset C,\ |C'| \ge 2}\ \max_{x \in X}\ \min_{b \in \mathbb{Z}_2} \frac{|C'_{x,b}|}{|C'|}.$$

$\gamma^C$ is small if there is a large subset $C'$ from which the response to any query to the
membership oracle rules out only a small fraction of the concepts. In this case we expect
the concept class to be difficult to learn. (Since $C'$ can contain only 3 concepts, from which
a query might eliminate only 1, $\gamma^C$ cannot be greater than 1/3.)

Conjecture 2. For any family of concept classes $\{C_i\}$ with $|C_i| \to \infty$, there exists a
quantum learning algorithm with membership oracle query complexity
$O\!\left(\log|C_i| / \sqrt{\gamma^{C_i}}\right)$.

The problems MAJORITY, SMALLSHIP($d$), and BIGSHIP($\alpha$) studied in this paper provide
examples of learning problems that satisfy the bounds given in the above conjectures.
For the BATTLESHIP problem, one calculates that $\gamma^{BS} = \min\{d/N, 1/3\}$. Thus, for
SMALLSHIP($d$), whose quantum sample complexity is $O(\sqrt{N})$, Conjecture 1 is sharp, while
Conjecture 2 provides the weaker bound of $O(\sqrt{N} \log N)$. For BIGSHIP($\alpha$), the situation is
reversed: Conjecture 2 provides a sharp upper bound of $O(\log N)$, while Conjecture 1 gives
the weaker upper bound of $O(\sqrt{N})$. For MAJORITY, whose quantum sample complexity
is at most $\log N$ (here $N = 2^n$ is the number of concepts), it is easy to see that Conjecture 2
holds by using the fact that $\gamma \le 1/3$.
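
For small $N$ the value of $\gamma$ for BATTLESHIP can be confirmed by brute force. The sketch below enumerates every subset $C'$ with at least two concepts, so its running time is exponential in $N$ and it is meant only as a sanity check; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def gamma(concepts, domain):
    """Brute-force gamma: min over subsets C' (|C'| >= 2) of
    max over x of min over b of |{c in C' : c(x) = b}| / |C'|."""
    best = None
    for m in range(2, len(concepts) + 1):
        for subset in combinations(concepts, m):
            value = max(
                Fraction(min(ones, m - ones), m)
                for ones in (sum(c[x] for c in subset) for x in domain)
            )
            if best is None or value < best:
                best = value
    return best

def battleship_class(N, r):
    """All N concepts b_a, each stored as the tuple (b_a(0), ..., b_a(N-1))."""
    return [
        tuple(1 if (a - x + r) % N <= 2 * r else 0 for x in range(N))
        for a in range(N)
    ]

results = {}
for N, r in [(7, 0), (10, 1), (8, 1)]:
    d = 2 * r + 1
    results[(N, d)] = gamma(battleship_class(N, r), range(N))
    print(N, d, results[(N, d)], min(Fraction(d, N), Fraction(1, 3)))
```

In each printed row the brute-force value and $\min\{d/N, 1/3\}$ should coincide.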